Fix constants and switch intrinsics to constant value inputs + tfuncs
#79
Extract Val type parameters directly via argextype at each intrinsic codegen site, rather than having get_constant implicitly unwrap Val and Constant type parameters. This keeps language-level concerns out of generic codegen infrastructure. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
0afd872 to
eb57664
Compare
Okay, I actually got In fact, I tried to take it one step further and introduce
diff -ruN src/compiler/interface.jl src/compiler/interface.jl
--- src/compiler/interface.jl 2026-02-07 20:19:24.423839322 +0100
+++ src/compiler/interface.jl 2026-02-07 20:20:13.241318532 +0100
@@ -66,7 +66,31 @@
CC.unlock_mi_inference(::cuTileInterpreter, ::MethodInstance) = nothing
# Setup caching - generates cache_owner and ipo_dataflow_analysis! methods
-@setup_caching cuTileInterpreter.cache
+# Caching setup — replaces @setup_caching to add per-intrinsic effect overrides.
+# @setup_caching generates cache_owner + finish!; we define both manually so we
+# can modify ipo_effects in finish! before the base method encodes them into
+# the CodeInstance's ipo_purity_bits.
+CC.cache_owner(interp::cuTileInterpreter) =
+ CompilerCaching.cache_owner(interp.cache)
+
+function CC.finish!(interp::cuTileInterpreter, caller::CC.InferenceState,
+ validation_world::UInt, time_before::UInt64)
+ CC.stack_analysis_result!(caller.result, CuTileResults())
+ # Apply per-intrinsic effect overrides to ipo_effects before the base
+ # finish! encodes them into ipo_purity_bits on the CodeInstance.
+ specTypes = caller.linfo.specTypes
+ if specTypes isa DataType
+ ftype = specTypes.parameters[1]
+ if isdefined(ftype, :instance)
+ override = _efunc(ftype.instance, caller.result.ipo_effects)
+ if override !== nothing
+ caller.result.ipo_effects = override
+ end
+ end
+ end
+ @invoke CC.finish!(interp::CC.AbstractInterpreter, caller::CC.InferenceState,
+ validation_world::UInt, time_before::UInt64)
+end
# Optimization flags
CC.may_optimize(::cuTileInterpreter) = true
@@ -83,6 +107,15 @@
# Intrinsics module exists).
tfunc(@nospecialize(f), argtypes::Vector{Any}) = nothing
+# Per-intrinsic effect overrides using multiple dispatch.
+# Returns nothing when no override applies (fallback).
+# Concrete per-intrinsic methods are defined in intrinsics/ for
+# side-effectful operations (stores, atomics).
+_efunc(@nospecialize(f), effects::CC.Effects) = nothing
+
+# Check if a function is defined in the Intrinsics module.
+_is_intrinsic(@nospecialize(f)) = isa(f, Function) && parentmodule(f) === Intrinsics
+
#=============================================================================
Subprogram inference for reduce/scan
=============================================================================#
@@ -174,7 +207,7 @@
sv::CC.InferenceState, max_methods::Int)
rt_override = tfunc(f, arginfo.argtypes)
subprog = _infer_subprogram(interp, f, arginfo, si, vtypes, sv)
- rt_override === nothing && subprog === nothing && return result
+ rt_override === nothing && subprog === nothing && !_is_intrinsic(f) && return result
wrapped = CC.Future{CC.CallMeta}()
push!(sv.tasks, function (interp′, sv′)
isready(result) || return false
@@ -182,8 +215,10 @@
cm = result[]
sp = subprog !== nothing ? subprog[] : nothing
rt = rt_override !== nothing ? rt_override : cm.rt
+ effects_override = _efunc(f, cm.effects)
+ effects = effects_override !== nothing ? effects_override : cm.effects
info = sp !== nothing ? SubprogramCallInfo(cm.info, sp.info) : cm.info
- wrapped[] = CC.CallMeta(rt, cm.exct, cm.effects, info, cm.refinements)
+ wrapped[] = CC.CallMeta(rt, cm.exct, effects, info, cm.refinements)
return true
end)
return wrapped
@@ -197,7 +232,7 @@
sv::CC.InferenceState, max_methods::Int)
rt_override = tfunc(f, arginfo.argtypes)
subprog = _infer_subprogram(interp, f, arginfo, si, nothing, sv)
- rt_override === nothing && subprog === nothing && return result
+ rt_override === nothing && subprog === nothing && !_is_intrinsic(f) && return result
wrapped = CC.Future{CC.CallMeta}()
push!(sv.tasks, function (interp′, sv′)
isready(result) || return false
@@ -205,8 +240,10 @@
cm = result[]
sp = subprog !== nothing ? subprog[] : nothing
rt = rt_override !== nothing ? rt_override : cm.rt
+ effects_override = _efunc(f, cm.effects)
+ effects = effects_override !== nothing ? effects_override : cm.effects
info = sp !== nothing ? SubprogramCallInfo(cm.info, sp.info) : cm.info
- wrapped[] = CC.CallMeta(rt, cm.exct, cm.effects, info, cm.refinements)
+ wrapped[] = CC.CallMeta(rt, cm.exct, effects, info, cm.refinements)
return true
end)
return wrapped
@@ -220,9 +257,13 @@
sv::CC.AbsIntState, max_methods::Int)
_infer_subprogram(interp, f, arginfo, si, nothing, sv) # side-effect only
rt_override = tfunc(f, arginfo.argtypes)
- if rt_override !== nothing
- return CC.CallMeta(rt_override, result.exct, result.effects,
- result.info, result.refinements)
+ effects_override = _efunc(f, result.effects)
+ if rt_override !== nothing || effects_override !== nothing
+ return CC.CallMeta(
+ rt_override !== nothing ? rt_override : result.rt,
+ result.exct,
+ effects_override !== nothing ? effects_override : result.effects,
+ result.info, result.refinements)
end
return result
end
diff -ruN src/compiler/intrinsics/atomics.jl src/compiler/intrinsics/atomics.jl
--- src/compiler/intrinsics/atomics.jl 2026-02-07 20:19:24.428020571 +0100
+++ src/compiler/intrinsics/atomics.jl 2026-02-07 20:21:03.798317824 +0100
@@ -41,10 +41,11 @@
"""
@noinline function atomic_cas(array::TileArray{T, N}, index, expected, desired,
memory_order::Int, memory_scope::Int) where {T, N}
- donotdelete()
compilerbarrier(:const, zero(T))::T
end
end
+_efunc(::typeof(Intrinsics.atomic_cas), effects::CC.Effects) =
+ CC.Effects(effects; effect_free=CC.ALWAYS_FALSE)
function emit_intrinsic!(ctx::CGCtx, ::typeof(Intrinsics.atomic_cas), args)
cb = ctx.cb
tt = ctx.tt
@@ -179,10 +180,11 @@
"""
@noinline function atomic_xchg(array::TileArray{T, N}, index, val,
memory_order::Int, memory_scope::Int) where {T, N}
- donotdelete()
compilerbarrier(:const, zero(T))
end
end
+_efunc(::typeof(Intrinsics.atomic_xchg), effects::CC.Effects) =
+ CC.Effects(effects; effect_free=CC.ALWAYS_FALSE)
function emit_intrinsic!(ctx::CGCtx, ::typeof(Intrinsics.atomic_xchg), args)
emit_atomic_rmw!(ctx, args, AtomicXCHG)
end
@@ -198,10 +200,11 @@
"""
@noinline function atomic_add(array::TileArray{T, N}, index, val,
memory_order::Int, memory_scope::Int) where {T, N}
- donotdelete()
compilerbarrier(:const, zero(T))
end
end
+_efunc(::typeof(Intrinsics.atomic_add), effects::CC.Effects) =
+ CC.Effects(effects; effect_free=CC.ALWAYS_FALSE)
function emit_intrinsic!(ctx::CGCtx, ::typeof(Intrinsics.atomic_add), args)
emit_atomic_rmw!(ctx, args, AtomicADD)
end
diff -ruN src/compiler/intrinsics/memory.jl src/compiler/intrinsics/memory.jl
--- src/compiler/intrinsics/memory.jl 2026-02-07 20:19:24.429705165 +0100
+++ src/compiler/intrinsics/memory.jl 2026-02-07 20:21:09.960317738 +0100
@@ -95,10 +95,11 @@
@noinline function store_ptr_tko(ptrs::Tile{Ptr{T}, S}, values::Tile{T, S},
latency::Union{Int, Nothing},
mask::Union{Tile{Bool, S}, Nothing}=nothing) where {T, S}
- donotdelete()
- nothing
+ compilerbarrier(:const, nothing)
end
end
+_efunc(::typeof(Intrinsics.store_ptr_tko), effects::CC.Effects) =
+ CC.Effects(effects; effect_free=CC.ALWAYS_FALSE)
function emit_intrinsic!(ctx::CGCtx, ::typeof(Intrinsics.store_ptr_tko), args)
cb = ctx.cb
tt = ctx.tt
diff -ruN src/compiler/intrinsics/views.jl src/compiler/intrinsics/views.jl
--- src/compiler/intrinsics/views.jl 2026-02-07 20:19:24.431298839 +0100
+++ src/compiler/intrinsics/views.jl 2026-02-07 20:21:15.551317660 +0100
@@ -378,10 +378,11 @@
latency::Union{Int, Nothing},
allow_tma::Bool,
indices::NTuple{M, <:Integer}) where {T, N, Shape, M}
- donotdelete()
- nothing
+ compilerbarrier(:const, nothing)
end
end
+_efunc(::typeof(Intrinsics.store_partition_view), effects::CC.Effects) =
+ CC.Effects(effects; effect_free=CC.ALWAYS_FALSE)
function emit_intrinsic!(ctx::CGCtx, ::typeof(Intrinsics.store_partition_view), args)
cb = ctx.cb
tt = ctx.tt
diff -ruN src/compiler/intrinsics.jl src/compiler/intrinsics.jl
--- src/compiler/intrinsics.jl 2026-02-07 20:19:24.426082161 +0100
+++ src/compiler/intrinsics.jl 2026-02-07 20:20:45.284318084 +0100
@@ -4,7 +4,7 @@
module Intrinsics
-using Base: compilerbarrier, donotdelete
+using Base: compilerbarrier
using ..cuTile: Tile, TileArray, Constant, TensorView, PartitionView
using ..cuTile: Signedness, SignednessSigned, SignednessUnsigned
using ..cuTile: ComparisonPredicate, CmpLessThan, CmpLessThanOrEqual, CmpGreaterThan, CmpGreaterThanOrEqual, CmpEqual, CmpNotEqual
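For readers skimming the diff above: the override mechanism comes down to a catch-all method that returns nothing, plus one method per side-effecting intrinsic. A minimal sketch of that dispatch pattern follows. ToyEffects, toy_load, toy_store, efunc and resolve_effects are hypothetical stand-ins made up for illustration; they are not the compiler's Effects type and not cuTile functions.

```julia
# Stand-in for the compiler's effects record (an assumption for this sketch,
# NOT Core.Compiler.Effects).
struct ToyEffects
    effect_free::Bool
    nothrow::Bool
end

# Hypothetical "intrinsics": one pure, one with a side effect.
toy_load(x) = x
toy_store(x) = nothing

# Fallback: no override applies, signalled by returning `nothing`.
efunc(@nospecialize(f), effects::ToyEffects) = nothing

# Per-function override selected by dispatch on typeof(f):
# a store must never be treated as effect-free.
efunc(::typeof(toy_store), effects::ToyEffects) =
    ToyEffects(false, effects.nothrow)

# Use the override when there is one, otherwise keep the inferred effects.
resolve_effects(f, inferred::ToyEffects) =
    something(efunc(f, inferred), inferred)

inferred = ToyEffects(true, true)
println(resolve_effects(toy_load, inferred))   # inferred effects kept
println(resolve_effects(toy_store, inferred))  # effect_free forced to false
```

The appeal of this shape is that adding a new side-effecting function only needs one extra method next to its definition; the call sites that consult efunc stay unchanged.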
We only use it for side-effects. So remove it from all pure intrinsics.
8cef2fb to
0cdb2aa
Compare
tfuncs
Idea: Instead of turning the |
We were conflating ghost values (e.g. nothing) with non-ghost things that shouldn't be emitted (e.g. Val instances). This came from a deeper issue where codegen had to unpack Val and Constant wrappers, which shouldn't be required as these have language-level operations to unpack (typevars, getindex). To avoid codegen having to do this, make the intrinsics take constants instead of type-wrapped values. While touching this code, also switch constants to being emitted lazily.

One disadvantage of this approach is that it relies heavily on constant-propagation at the Julia level to infer types from value inputs, hence the @constprop :aggressive. I considered using explicit tfuncs as an alternative, combined with intrinsics returning Any, but that didn't fully work because the inferred rettyp from the body kept leaking in. We also had to keep bodies for now because of JuliaLang/julia#60583. Maybe something to revisit later.

EDIT: revisited; see below.
Fixes #77.
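
To make the trade-off in the description concrete, here is a small sketch contrasting a type-wrapped input with a plain value input. zeros_tuple and caller are hypothetical helpers invented for this illustration, not cuTile intrinsics, and the sketch only shows the general Julia mechanism, not how this PR implements it.

```julia
# Hypothetical helpers for illustration only; not cuTile intrinsics.

# Type-wrapped input: N lives in the type domain, so the result type
# follows from the argument type alone.
zeros_tuple(::Val{N}) where {N} = ntuple(_ -> 0, Val(N))

# Value input: the result length depends on the runtime value of `n`, so
# inference can only narrow the return type when `n` is a known constant
# at the call site. The annotation asks the compiler to be more eager
# about propagating such constants into this method.
Base.@constprop :aggressive function zeros_tuple(n::Int)
    return ntuple(_ -> 0, Val(n))
end

# A caller that passes a literal, giving constant propagation a chance
# to see the value 3.
caller() = zeros_tuple(3)

println(zeros_tuple(Val(2)))
println(caller())
```

Comparing Base.return_types(caller, ()) with Base.return_types(zeros_tuple, (Int,)) is one way to check whether the constant made it through inference; the exact result may vary between Julia versions.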